AITopics

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Neural Information Processing Systems, Jun-13-2026, 08:36:57 GMT

Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals

artificial intelligence, machine learning, proceedings, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Feb-8-2026, 02:17:37 GMT

Learning Physical Constraints with Neural Projections

How does a human being distinguish the motions of a piece of paper and a piece of cloth? A high-school physics teacher might answer that they are both tangentially inextensible but cloth cannot resist any bending force from the normal direction.

artificial intelligence, constraint, machine learning, (19 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Education > Curriculum > Subject-Specific Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

arXiv.org Artificial Intelligence, Nov-27-2025

Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals

Gillman, Nate, Herrmann, Charles, Freeman, Michael, Aggarwal, Daksh, Luo, Evan, Sun, Deqing, Sun, Chen

Recent advances in video generation models have sparked interest in world models capable of simulating realistic environments. While navigation has been well-explored, physically meaningful interactions that mimic real-world forces remain largely understudied. In this work, we investigate using physical forces as a control signal for video generation and propose force prompts which enable users to interact with images through both localized point forces, such as poking a plant, and global wind force fields, such as wind blowing on fabric. We demonstrate that these force prompts can enable videos to respond realistically to physical control signals by leveraging the visual and motion prior in the original pretrained model, without using any 3D asset or physics simulator at inference. The primary challenge of force prompting is the difficulty in obtaining high quality paired force-video training data, both in the real world due to the difficulty of obtaining force signals, and in synthetic data due to limitations in the visual quality and domain diversity of physics simulators. Our key finding is that video generation models can generalize remarkably well when adapted to follow physical force conditioning from videos synthesized by Blender, even with limited demonstrations of few objects. Our method can generate videos which simulate forces across diverse geometries, settings, and materials. We also try to understand the source of this generalization and perform ablations that reveal two key elements: visual diversity and the use of specific text keywords during training. Our approach is trained on only around 15k training examples for a single day on four A100 GPUs, and outperforms existing methods on force adherence and physics realism, bringing world models closer to real-world physics interactions. We release all datasets, code, weights, and interactive video demos at our project page.

artificial intelligence, inductive learning, machine learning, (14 more...)

2505.19386

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)

Neural Information Processing Systems, Nov-21-2025, 15:48:23 GMT

Visual Interaction Networks: Learning a Physics Simulator from Video

From just a glance, humans can make rich predictions about the future of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains or require information about the underlying state. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments.
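The rollout idea in this abstract (compute pairwise interactions between object states, then step the states forward for an arbitrary number of steps) can be illustrated with a minimal sketch. This is not the Visual Interaction Network itself: the learned relation network is replaced here by a hand-written spring force, and the names `pairwise_effect`, `step`, and `rollout`, the unit masses, and the time step are all assumptions made for illustration.

```python
from itertools import permutations


def pairwise_effect(receiver, sender, stiffness=1.0):
    """Hand-written stand-in for a learned relation network (assumption):
    a spring that pulls the receiver toward the sender.
    Returns the 2D acceleration contribution on the receiver."""
    (rx, ry), _ = receiver
    (sx, sy), _ = sender
    return (stiffness * (sx - rx), stiffness * (sy - ry))


def step(states, dt=0.01):
    """Advance every object by one time step.
    Each state is ((x, y), (vx, vy)); unit mass is assumed."""
    accel = [(0.0, 0.0)] * len(states)
    # Sum the effect of every ordered (receiver, sender) pair.
    for i, j in permutations(range(len(states)), 2):
        ax, ay = pairwise_effect(states[i], states[j])
        accel[i] = (accel[i][0] + ax, accel[i][1] + ay)
    new_states = []
    for ((x, y), (vx, vy)), (ax, ay) in zip(states, accel):
        # Semi-implicit Euler: update velocity first, then position.
        vx, vy = vx + dt * ax, vy + dt * ay
        new_states.append(((x + dt * vx, y + dt * vy), (vx, vy)))
    return new_states


def rollout(states, num_steps, dt=0.01):
    """Apply `step` repeatedly and return the whole trajectory,
    including the initial states."""
    trajectory = [states]
    for _ in range(num_steps):
        states = step(states, dt)
        trajectory.append(states)
    return trajectory


# Two objects at rest, placed symmetrically about the origin.
initial = [((-1.0, 0.0), (0.0, 0.0)), ((1.0, 0.0), (0.0, 0.0))]
trajectory = rollout(initial, num_steps=200)
```

In the model described above, the object states would come from a perceptual front-end rather than being given, and the pairwise function would be learned jointly with it; the loop structure of the rollout is the only part this sketch is meant to convey.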

name change, physics simulator, visual interaction network, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Kwon, Minseo, Kim, Young J.

Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling

arXiv.org Artificial Intelligence, Oct-31-2025

Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration of a TAMP solution and backtracks the search based on visual renderings of the states.

I. INTRODUCTION

Robotic manipulation tasks, such as tabletop manipulations, require reasoning over both symbolic task decisions and continuous geometric feasibility. A robot must decide which action to perform (such as picking, placing, or stacking) and which object to grasp, which constitutes a discrete search process. Simultaneously, it must determine grasp poses, feasible end-effector configurations, and collision-free motion trajectories governed by continuous constraints. This class of problems is studied under the framework of Task and Motion Planning (TAMP), which combines high-level task planning with continuous action parameter binding and low-level motion planning [1], [2].

artificial intelligence, constraint, task and motion planning, (14 more...)

2510.26139

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)

Neural Information Processing Systems, Oct-2-2025, 16:23:39 GMT

Learning Physical Constraints with Neural Projections

Yang, Shuqi

We propose a new family of neural networks to predict the behaviors of physical systems by learning their underpinning constraints.

artificial intelligence, constraint, machine learning, (18 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Joyce, Eric C., Zhao, Qianwen, Burgdorfer, Nathaniel, Wang, Long, Mordohai, Philippos

Consensus-Driven Uncertainty for Robotic Grasping based on RGB Perception

arXiv.org Artificial Intelligence, Jun-27-2025

Deep object pose estimators are notoriously overconfident. A grasping agent that both estimates the 6-DoF pose of a target object and predicts the uncertainty of its own estimate could avoid task failure by choosing not to act under high uncertainty. Even though object pose estimation improves and uncertainty quantification research continues to make strides, few studies have connected them to the downstream task of robotic grasping. We propose a method for training lightweight, deep networks to predict whether a grasp guided by an image-based pose estimate will succeed before that grasp is attempted. We generate training data for our networks via object pose estimation on real images and simulated grasping. We also find that, despite high object variability in grasping trials, networks benefit from training on all objects jointly, suggesting that a diverse variety of objects can nevertheless contribute to the same goal.

Remarkable progress in object pose estimation from single RGB images has been made in the past few years [1]-[4], primarily driven by deep learning and the ability to reduce the so-called sim2real gap. This has enabled end-to-end system training on large amounts of synthetic data with precise ground truth. Consider for example the pose estimates illustrated in Figure 1. These were made by current methods, yet all four caused grasping attempts to fail when used as guides. Motivated by this disconnect between pose evaluation and success in downstream grasping, we propose an approach to estimate the likelihood of success before a grasp is actually attempted.
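The "choose not to act under high uncertainty" idea can be sketched as a simple abstention rule. This is an illustration only, not the authors' method: it assumes some predictor has already produced a success probability for each candidate grasp, and the function names, the candidate labels, and the 0.8 threshold are hypothetical.

```python
def decide_grasp(success_probability, threshold=0.8):
    """Return "grasp" when the predicted success probability clears the
    threshold, otherwise "abstain". The threshold value is an assumption."""
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success_probability must lie in [0, 1]")
    return "grasp" if success_probability >= threshold else "abstain"


def select_grasps(candidates, threshold=0.8):
    """Keep only the candidate names whose predicted success probability
    clears the threshold. `candidates` is a list of (name, probability)."""
    return [
        name
        for name, probability in candidates
        if decide_grasp(probability, threshold) == "grasp"
    ]


# Hypothetical predictions for three pose estimates.
candidates = [("pose_a", 0.95), ("pose_b", 0.40), ("pose_c", 0.80)]
accepted = select_grasps(candidates)  # ["pose_a", "pose_c"]
```

Raising the threshold trades fewer attempted grasps for fewer failures; where to set it would depend on the cost of a failed grasp in a given application.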

artificial intelligence, gripper, machine learning, (15 more...)

2506.20045

Country:

North America > United States > Utah (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Samplawski, Colin, Cobb, Adam D., Jha, Susmit

AGENT: An Aerial Vehicle Generation and Design Tool Using Large Language Models

arXiv.org Artificial Intelligence, Apr-15-2025

Computer-aided design (CAD) is a promising application area for emerging artificial intelligence methods. Traditional workflows for cyberphysical systems create detailed digital models which can be evaluated by physics simulators in order to narrow the search space before creating physical prototypes. A major bottleneck of this approach is that the simulators are often computationally expensive and slow. Recent advancements in AI methods offer the possibility to accelerate these pipelines. We use the recently released AircraftVerse dataset, which is especially suited for developing and evaluating large language models for designs. AircraftVerse contains a diverse set of UAV designs represented via textual design trees together with detailed physics simulation results. Following the recent success of large language models (LLMs), we propose AGENT (Aircraft GENeraTor). AGENT is a comprehensive design tool built on the CodeT5+ LLM which learns powerful representations of aircraft textual designs directly from JSON files. We develop a curriculum of training tasks which imbues a single model with a suite of useful features. AGENT is able to generate designs conditioned on properties of flight dynamics (hover time, maximum speed, etc.). Additionally, AGENT can issue evaluations of designs allowing it to act as a surrogate model of the physics simulation that underlies the AircraftVerse dataset. We present a series of experiments which demonstrate our system's abilities. We are able to achieve strong performance using the smallest member of the CodeT5+ family (220M parameters). This allows for a flexible and powerful system which can be executed on a single GPU enabling a clear path toward future deployment.

large language model, machine learning, natural language, (17 more...)

2504.08981

Genre: Research Report (0.64)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.34)

arXiv.org Artificial Intelligence, Mar-8-2025

TeraSim: Uncovering Unknown Unsafe Events for Autonomous Vehicles through Generative Simulation

Sun, Haowei, Yan, Xintao, Qiao, Zhijie, Zhu, Haojie, Sun, Yihao, Wang, Jiawei, Shen, Shengyin, Hogue, Darian, Ananta, Rajanikant, Johnson, Derek, Stevens, Greg, McGuire, Greg, Wei, Yifan, Zheng, Wei, Sun, Yong, Fukai, Yasuo, Liu, Henry X.

Traffic simulation is essential for autonomous vehicle (AV) development, enabling comprehensive safety evaluation across diverse driving conditions. However, traditional rule-based simulators struggle to capture complex human interactions, while data-driven approaches often fail to maintain long-term behavioral realism or generate diverse safety-critical events. To address these challenges, we propose TeraSim, an open-source, high-fidelity traffic simulation platform designed to uncover unknown unsafe events and efficiently estimate AV statistical performance metrics, such as crash rates. TeraSim is designed for seamless integration with third-party physics simulators and standalone AV stacks, to construct a complete AV simulation system. Experimental results demonstrate its effectiveness in generating diverse safety-critical events involving both static and dynamic agents, identifying hidden deficiencies in AV systems, and enabling statistical performance evaluation. These findings highlight TeraSim's potential as a practical tool for AV safety assessment, benefiting researchers, developers, and policymakers. The code is available at https://github.com/mcity/TeraSim.

simulation, simulator, terasim, (16 more...)

2503.03629

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Michigan > Wayne County > Plymouth (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (0.95)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.86)